
Effectively Learning Initiation Sets in Hierarchical Reinforcement Learning

Neural Information Processing Systems

An agent learning an option in hierarchical reinforcement learning must solve three problems: identify the option's subgoal (termination condition), learn a policy, and learn where that policy will succeed (initiation set). The termination condition is typically identified first, but the option policy and initiation set must be learned simultaneously, which is challenging because the initiation set depends on the option policy, which changes as the agent learns. Consequently, data obtained from option execution becomes invalid over time, leading to an inaccurate initiation set that subsequently harms downstream task performance. We highlight three issues---data non-stationarity, temporal credit assignment, and pessimism---specific to learning initiation sets, and propose to address them using tools from off-policy value estimation and classification. We show that our method learns higher-quality initiation sets faster than existing methods (in MiniGrid and Montezuma's Revenge), can automatically discover promising grasps for robot manipulation (in Robosuite), and improves the performance of a state-of-the-art option discovery method in a challenging maze navigation task in MuJoCo.
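The three components the abstract names map naturally onto a small data structure. Below is a minimal Python sketch, not the paper's implementation: the Option class, the logistic initiation classifier, and the update_initiation helper are all illustrative assumptions, with the per-sample weights standing in for the off-policy corrections and pessimism fixes the authors propose.

```python
from dataclasses import dataclass
from typing import Callable
import numpy as np

@dataclass
class Option:
    policy: Callable[[np.ndarray], int]        # pi_o: state -> primitive action
    termination: Callable[[np.ndarray], bool]  # beta_o: subgoal / stopping test
    init_weights: np.ndarray                   # logistic initiation classifier

    def can_initiate(self, state: np.ndarray, threshold: float = 0.5) -> bool:
        # Initiation set as a thresholded probability of option success.
        p = 1.0 / (1.0 + np.exp(-(state @ self.init_weights)))
        return p >= threshold

def update_initiation(option: Option, states, successes, weights, lr=0.1):
    """One gradient step on a weighted logistic loss.

    `successes` are 1/0 outcomes of past option executions; `weights`
    can down-weight stale rollouts (the non-stationarity issue) or carry
    value-based corrections instead of raw Monte Carlo labels.
    """
    p = 1.0 / (1.0 + np.exp(-(states @ option.init_weights)))
    grad = states.T @ (weights * (p - successes)) / len(states)
    option.init_weights -= lr * grad

# Toy usage with synthetic data.
rng = np.random.default_rng(0)
opt = Option(policy=lambda s: 0,
             termination=lambda s: bool(s[0] > 1.0),
             init_weights=np.zeros(3))
states = rng.normal(size=(32, 3))
successes = (states[:, 0] > 0).astype(float)   # toy success labels
update_initiation(opt, states, successes, weights=np.ones(32))
```

The key point the abstract makes is that the success labels themselves go stale as the option policy improves, which is why a plain classifier over raw outcomes is not enough.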




Scalable Option Learning in High-Throughput Environments

Henaff, Mikael, Fujimoto, Scott, Matthews, Michael, Rabbat, Michael

arXiv.org Artificial Intelligence

Hierarchical reinforcement learning (RL) has the potential to enable effective decision-making over long timescales. Existing approaches, while promising, have yet to realize the benefits of large-scale training. In this work, we identify and solve several key challenges in scaling online hierarchical RL to high-throughput environments. We propose Scalable Option Learning (SOL), a highly scalable hierarchical RL algorithm which achieves roughly 35x higher throughput compared to existing hierarchical methods. To demonstrate SOL's performance and scalability, we train hierarchical agents using 30 billion frames of experience on the complex game of NetHack, significantly surpassing flat agents and demonstrating positive scaling trends. We also validate SOL on MiniHack and MuJoCo environments, showcasing its general applicability. Our code is open-sourced at: github.com/facebookresearch/sol.
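To make the throughput concern concrete, here is a hedged sketch of the batched rollout loop any high-throughput hierarchical agent needs: options run to termination independently in each environment of a vectorized batch, and the high-level policy is queried only where an option has ended. The environment stand-in, the random policies, and all constants below are placeholders, not SOL's actual architecture.

```python
import numpy as np

NUM_ENVS, OBS_DIM, NUM_OPTIONS, NUM_ACTIONS = 64, 8, 4, 6
rng = np.random.default_rng(0)

obs = rng.normal(size=(NUM_ENVS, OBS_DIM))          # batched observations
active = rng.integers(NUM_OPTIONS, size=NUM_ENVS)   # current option per env

def high_level_policy(obs_batch):
    # Placeholder: picks a fresh option id for each given observation.
    return rng.integers(NUM_OPTIONS, size=len(obs_batch))

def low_level_policy(obs_batch, option_ids):
    # Placeholder: option-conditioned primitive action per env.
    return rng.integers(NUM_ACTIONS, size=len(obs_batch))

def terminated(obs_batch, option_ids):
    # Placeholder per-env termination test beta(s); ~10% chance per step.
    return rng.random(len(obs_batch)) < 0.1

for step in range(100):
    actions = low_level_policy(obs, active)
    obs = obs + 0.01 * rng.normal(size=obs.shape)   # stand-in for env.step
    done = terminated(obs, active)
    # Re-select options only where the previous one terminated; every
    # other env keeps stepping its option, so the batch never stalls.
    active[done] = high_level_policy(obs[done])
```

The design point is that the high-level and low-level policies operate on different, ragged timescales, and keeping the whole batch advancing despite that is what a scalable implementation has to get right.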



A Study of Value-Aware Eigenoptions

Kotamreddy, Harshil, Machado, Marlos C.

arXiv.org Machine Learning

Options, which impose an inductive bias toward temporal and hierarchical structure, offer a powerful framework for reinforcement learning (RL). While effective in sequential decision-making, they are often handcrafted rather than learned. Among approaches for discovering options, eigenoptions have shown strong performance in exploration, but their role in credit assignment remains underexplored. In this paper, we investigate whether eigenoptions can accelerate credit assignment in model-free RL, evaluating them in tabular and pixel-based gridworlds. We find that pre-specified eigenoptions aid not only exploration but also credit assignment, whereas online discovery can bias the agent's experience too strongly and hinder learning. In the context of deep RL, we also propose a method for learning option-values under non-linear function approximation, highlighting the impact of termination conditions on performance. Our findings reveal both the promise and complexity of using eigenoptions, and options more broadly, to simultaneously support credit assignment and exploration in reinforcement learning.
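For readers unfamiliar with eigenoptions, the construction fits in a few lines: eigenvectors of the graph Laplacian of the state-transition graph define intrinsic reward functions, and each eigenoption's policy maximizes one of them. The sketch below uses a toy 4-state chain in place of the paper's gridworlds and follows the standard eigenoption reward definition, not this paper's specific value-aware variant.

```python
import numpy as np

# Adjacency matrix of a 4-state chain: 0 - 1 - 2 - 3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))
L = D - A                                # combinatorial graph Laplacian

eigvals, eigvecs = np.linalg.eigh(L)     # eigenvalues in ascending order
e = eigvecs[:, 1]                        # first non-trivial eigenvector

def intrinsic_reward(s: int, s_next: int) -> float:
    # Standard eigenoption reward: gain along the eigenvector,
    # r(s, s') = e[s'] - e[s]; the option terminates where no
    # action yields positive reward.
    return e[s_next] - e[s]

print(intrinsic_reward(0, 1), intrinsic_reward(3, 2))
```

Following one eigenvector's gradient drives the agent toward one extreme of the state space's slowest-varying structure, which is why eigenoptions help exploration; the paper's question is whether the same structure also shortens credit-assignment paths.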

